
feat(grpc): implement continuous Watch streaming for health servicers #917

Open
V2arK wants to merge 11 commits into lightseekorg:main from V2arK:feat/grpc-watch-continuous-stream

Conversation

@V2arK
Contributor

@V2arK V2arK commented Mar 26, 2026

Description

Problem

SGLangHealthServicer.Watch() and VllmHealthServicer.Watch() yield a single response then close the stream. This violates the gRPC Health Checking Protocol, which requires Watch to be a long-lived server-streaming RPC that sends updates whenever the service's health status changes.

Additionally, SGLangHealthServicer.Watch() delegates to self.Check(), which calls context.set_code(NOT_FOUND) and context.set_details() for unknown services, polluting the streaming response context.

Follow-up from #885. Ref: vllm-project/vllm#38016.

Solution

Add HealthWatchMixin providing the Watch loop skeleton (poll + asyncio.Event for immediate shutdown wakeup, yield-on-change, cancel handling). Both servicers integrate the mixin and implement _compute_watch_status() and _is_shutting_down().

  • SGLang: sync status computation (dict lookup + scheduler responsiveness check)
  • vLLM: async status computation (await async_llm.check_health())

The mixin's _resolve_watch_status() bridge method auto-detects sync vs async implementations via asyncio.iscoroutine(), so each servicer uses its natural calling convention.
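The dispatch described above can be sketched roughly as follows (illustrative only: the class and method names mirror the PR description, but the bodies and status values are assumptions, not the actual implementation):

```python
import asyncio

class HealthWatchMixin:
    """Sketch of the sync/async bridge described above (not the real mixin)."""

    async def _resolve_watch_status(self, service_name: str) -> int:
        # Subclasses may implement _compute_watch_status either as a plain
        # method returning an int, or as `async def` returning a coroutine.
        result = self._compute_watch_status(service_name)
        if asyncio.iscoroutine(result):
            result = await result
        return result

class SyncServicer(HealthWatchMixin):
    def _compute_watch_status(self, service_name: str) -> int:
        return 1  # e.g. SERVING, computed synchronously

class AsyncServicer(HealthWatchMixin):
    async def _compute_watch_status(self, service_name: str) -> int:
        return 1  # e.g. SERVING, computed after awaiting a health probe

async def main() -> tuple:
    return (
        await SyncServicer()._resolve_watch_status("grpc.health.v1.Health"),
        await AsyncServicer()._resolve_watch_status("grpc.health.v1.Health"),
    )

print(asyncio.run(main()))  # (1, 1): both paths resolve to the same status
```

Either calling convention reaches the same code path in the Watch loop, which is what lets each servicer keep its natural style.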

Spec deviation: for unknown services, the stream sends SERVICE_UNKNOWN once then exits (spec says keep open for dynamic registration, but smg services are statically defined).

Test Plan

```shell
cd grpc_servicer
pip install -e ".[test]"
pytest tests/ -xvs
```

Unit tests: 14/14 passed (macOS + x86 Linux)

| # | Test | SGLang | vLLM |
|---|------|--------|------|
| 1 | Initial status sent immediately | PASS | PASS |
| 2 | Status change yields new response | PASS | PASS |
| 3 | Shutdown exits stream | PASS | PASS |
| 4 | Client cancel handled cleanly | PASS | PASS |
| 5 | Unknown service: SERVICE_UNKNOWN, no context.set_code | PASS | PASS |
| 6 | No duplicate sends on stable status | PASS | PASS |
| 7 | Shutdown edge case (graceful_exit poll / shutdown overrides healthy) | PASS | PASS |

vLLM E2E Watch deferred -- requires vllm-project/vllm#38016 to register grpc.health.v1.Health in the gRPC server.

Checklist
  • cargo +nightly fmt passes (no Rust changes)
  • cargo clippy --all-targets --all-features -- -D warnings passes (no Rust changes)
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • Chores

    • Updated package version to 0.6.0.
    • Added test dependencies: pytest, pytest-asyncio, pytest-timeout.
  • New Features

    • Consolidated continuous health-streaming behavior across health servicers for more consistent status reporting.
  • Refactor

    • Factored shared health-watch streaming logic into a reusable component used by multiple servicers.
  • Tests

    • Added test configuration and fixtures for gRPC health tests.
    • Added comprehensive async tests covering streaming, shutdown, cancellation, and unknown-service handling.

@V2arK V2arK marked this pull request as draft March 26, 2026 17:05
@coderabbitai

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

Adds a reusable HealthWatchMixin implementing the gRPC Health Watch streaming RPC, refactors SGLang and vLLM health servicers to use the mixin (removing their in-class Watch implementations), bumps package version and test extras, and introduces async tests and test fixtures for watch behavior.

Changes

  • Package Configuration (grpc_servicer/pyproject.toml): Version bumped 0.5.2 → 0.6.0; adds [project.optional-dependencies].test with pytest>=7.0, pytest-asyncio>=0.21, pytest-timeout>=2.0.
  • Health Watch Mixin (grpc_servicer/smg_grpc_servicer/health_watch.py): New HealthWatchMixin providing the async server-streaming Watch, polling interval, shutdown coordination, and the _resolve_watch_status adapter for sync/async per-servicer status computation.
  • SGLang Servicer (grpc_servicer/smg_grpc_servicer/sglang/health_servicer.py): Now inherits HealthWatchMixin; removes its inline Watch RPC; adds module-level status constants and SCHEDULER_RESPONSIVENESS_TIMEOUT_S; calls _init_watch() and _notify_shutdown() and implements _compute_watch_status()/_is_shutting_down().
  • vLLM Servicer (grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py): Now inherits HealthWatchMixin; removes its inline Watch RPC; adds module-level status constants and implements _compute_watch_status() and _is_shutting_down(), plus _init_watch()/_notify_shutdown() integration.
  • Test Fixtures (grpc_servicer/tests/conftest.py): New pytest conftest: injects MagicMock stubs for vllm.* and sglang.*, exposes health status constants, and adds grpc_context and request_msg fixtures.
  • Health Watch Tests (grpc_servicer/tests/test_sglang_health_watch.py, grpc_servicer/tests/test_vllm_health_watch.py): New async test suites covering Watch streaming: immediate emission, status transitions, shutdown notification, client cancellation, unknown-service behavior, and duplicate suppression across polls.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant WatchMixin as HealthWatchMixin
  participant Servicer
  participant Backend as Scheduler/LLM
  participant ShutdownEvent as _watch_shutdown_event

  Client->>WatchMixin: Watch(request)
  WatchMixin->>Servicer: _resolve_watch_status(service)
  Servicer->>Backend: compute health (sync/async)
  Backend-->>Servicer: status (SERVING/NOT_SERVING/UNKNOWN)
  Servicer-->>WatchMixin: status
  WatchMixin-->>Client: yield HealthCheckResponse(status)
  loop poll until shutdown or cancel
    alt status changes
      WatchMixin-->>Client: yield updated HealthCheckResponse
    else no change
      WatchMixin->>ShutdownEvent: wait(timeout=WATCH_POLL_INTERVAL_S)
    end
    alt shutdown notified or cancelled
      WatchMixin-->>Client: (final status if needed) and end stream
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • CatherineSue
  • slin1237

Poem

🐇 Hopping through the Watch with a tiny hop and twitch,
I pulse the health and patch each little glitch.
When status shifts I whistle, when shutdown comes I slow,
I stream the truths, then gently go.
Tests tuck me in — carrot dreams aglow. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: coverage is 61.70%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅: the title accurately and concisely summarizes the main change: implementing continuous Watch streaming for health servicers.



@github-actions github-actions bot added dependencies Dependency updates tests Test changes labels Mar 26, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a shared HealthWatchMixin to implement the gRPC Health Checking Protocol's Watch RPC for both SGLang and vLLM inference engines. The mixin provides a continuous streaming response that updates clients on health status changes or server shutdown. The PR also includes a version bump to 0.6.0, the addition of test-specific dependencies in pyproject.toml, and a comprehensive suite of unit tests. Feedback was provided regarding the use of inspect.isawaitable() for more robust detection of asynchronous results in the mixin's status resolution logic.

```python
async def _resolve_watch_status(self, service_name: str) -> int:
    """Call _compute_watch_status, handling both sync and async impls."""
    result = self._compute_watch_status(service_name)
    if asyncio.iscoroutine(result):
```


Severity: medium

For more robust detection of awaitable results from _compute_watch_status, it's better to use inspect.isawaitable() instead of asyncio.iscoroutine(). isawaitable() is more general and correctly handles not just coroutines from async def functions, but also other awaitable objects like asyncio.Future or custom objects with an __await__ method. This makes the mixin more resilient to different implementation patterns in subclasses.

You'll need to add import inspect at the top of the file.

Suggested change

```diff
-if asyncio.iscoroutine(result):
+if inspect.isawaitable(result):
```

Contributor Author


Accepted. Switched to inspect.isawaitable() -- it's the idiomatic pattern for this and handles the sync/async dispatch more robustly.

Note: the broader rationale about asyncio.Future and custom __await__ objects doesn't apply here since _compute_watch_status only ever returns int (sync) or a coroutine (async def), but isawaitable is still the right call.
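The difference the reviewer is pointing at can be shown in a few lines (standalone demo, unrelated to the actual servicer code):

```python
import asyncio
import inspect

async def coro():
    return 1

async def demo() -> tuple:
    c = coro()
    # A coroutine object satisfies both predicates.
    both = asyncio.iscoroutine(c) and inspect.isawaitable(c)
    await c  # consume it so Python does not warn "never awaited"

    # A Future is awaitable but NOT a coroutine: the case that
    # inspect.isawaitable() covers and asyncio.iscoroutine() would miss.
    fut = asyncio.get_running_loop().create_future()
    fut.set_result(2)
    return both, inspect.isawaitable(fut), asyncio.iscoroutine(fut)

print(asyncio.run(demo()))  # (True, True, False)
```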

@V2arK
Contributor Author

V2arK commented Mar 26, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration
Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b35dd2b6-6b84-4827-892b-4ba6ea2d1214


@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 26, 2026

@V2arK Sure, I'll review the PR now!


🧠 Learnings used
Learnt from: V2arK
Repo: lightseekorg/smg PR: 885
File: grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py:0-0
Timestamp: 2026-03-24T18:53:58.838Z
Learning: In repo lightseekorg/smg, the single-yield Watch() pattern in grpc health servicers is intentional and consistent across both `grpc_servicer/smg_grpc_servicer/sglang/health_servicer.py` and `grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py`. Kubernetes only uses the Check RPC, so Watch yields one status response then closes. A persistent-stream Watch is a known future improvement to be implemented for both servicers together in a follow-up PR. Do not flag the single-yield Watch as a protocol violation in these files.

Learnt from: V2arK
Repo: lightseekorg/smg PR: 885
File: grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py:49-52
Timestamp: 2026-03-24T18:53:15.536Z
Learning: In repo lightseekorg/smg, `VllmHealthServicer` (grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py) is intentionally only defined in smg — it is a library component. Instantiation, gRPC server registration, and `set_not_serving()` invocation during graceful shutdown are all wired in the consumer repo (vLLM's `vllm/entrypoints/grpc_server.py`, `serve_grpc()` function). This is the same pattern as `SGLangHealthServicer`, which is defined in smg but wired externally by the consumer. Do not flag missing server wiring in health_servicer.py as a bug.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 012e5f6e16

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

```python
    self._watch_shutdown_event.wait(),
    timeout=self.WATCH_POLL_INTERVAL_S,
)
except TimeoutError:
```


P1: Catch asyncio.TimeoutError in Watch poll loop

Watch() currently catches built-in TimeoutError, but on Python 3.10 (which is supported via requires-python >=3.10) asyncio.wait_for() raises asyncio.TimeoutError instead. When a stream is healthy and no shutdown event occurs for one poll interval, that timeout escapes the loop and aborts the RPC rather than continuing to poll, so long-lived watch streams terminate unexpectedly in normal operation.


Contributor Author


Good catch -- this is a real bug on Python 3.10.

On 3.10, asyncio.TimeoutError inherits from Exception directly, NOT from the builtin TimeoutError. So except TimeoutError: misses it entirely, causing the poll loop to crash on every interval when no shutdown event fires. On 3.11+, they unified the two (asyncio.TimeoutError is builtins.TimeoutError), which is why this wasn't caught in local testing.

Fixed: except asyncio.TimeoutError: -- works on both 3.10 and 3.11+ since it's the same class on 3.11+.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@grpc_servicer/smg_grpc_servicer/sglang/health_servicer.py`:
- Around line 149-176: Extract the hard-coded 30s timeout into a class-level
constant (e.g., SCHEDULER_RESPONSIVENESS_TIMEOUT_S) and replace the literal 30
in both _compute_watch_status and Check with
self.SCHEDULER_RESPONSIVENESS_TIMEOUT_S; add the constant to the class
definition, update the time_since comparison in _compute_watch_status and the
corresponding check in Check() to use that constant, and ensure any tests or
other methods referencing the 30s behavior use the new constant name.


📥 Commits

Reviewing files that changed from the base of the PR and between cb8407f and 012e5f6.

📒 Files selected for processing (8)
  • grpc_servicer/pyproject.toml
  • grpc_servicer/smg_grpc_servicer/health_watch.py
  • grpc_servicer/smg_grpc_servicer/sglang/health_servicer.py
  • grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py
  • grpc_servicer/tests/__init__.py
  • grpc_servicer/tests/conftest.py
  • grpc_servicer/tests/test_sglang_health_watch.py
  • grpc_servicer/tests/test_vllm_health_watch.py

V2arK added 9 commits April 11, 2026 13:51
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
TDD red phase: 7 tests for SGLangHealthServicer.Watch() continuous
streaming. 5 fail against current single-yield implementation.
Adds sglang MagicMock stubs to conftest to allow collection without
a full SGLang installation.

Signed-off-by: Honglin Zhu <honglin@nvidia.com>
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
TDD red phase: 7 tests for VllmHealthServicer.Watch() continuous
streaming. 3 fail (exits_on_shutdown, engine_failure, no_duplicate)
as expected; 4 pass against current single-yield stub. Also adds
vllm module stubs to conftest so tests collect without vLLM installed.

Signed-off-by: Honglin <honglin@nvidia.com>
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
_notify_shutdown() now also sets self._watch_notified_shutdown = True
so subclasses can detect explicit shutdown (via set_not_serving()) in
_is_shutting_down() independently of their engine-specific flags.

Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
@V2arK V2arK force-pushed the feat/grpc-watch-continuous-stream branch from 012e5f6 to 9adb736 Compare April 11, 2026 17:51
@github-actions github-actions bot added the grpc gRPC client and router changes label Apr 11, 2026
- Use inspect.isawaitable() instead of asyncio.iscoroutine() for
  idiomatic sync/async dispatch in _resolve_watch_status()
- Fix Python 3.10 bug: catch asyncio.TimeoutError instead of builtin
  TimeoutError in Watch poll loop (they diverge on <3.11)
- Extract scheduler responsiveness timeout (30s) as class constant
  SCHEDULER_RESPONSIVENESS_TIMEOUT_S to eliminate magic number
- Update test conftest stubs to match current import graph

Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
@V2arK V2arK marked this pull request as ready for review April 11, 2026 18:48
@V2arK V2arK requested a review from njhill as a code owner April 11, 2026 18:48
@V2arK
Contributor Author

V2arK commented Apr 11, 2026

All three review items addressed in a79ac91:

  • inspect.isawaitable() replacing asyncio.iscoroutine() (gemini-code-assist)
  • except asyncio.TimeoutError fixing Python 3.10 bug (chatgpt-codex)
  • SCHEDULER_RESPONSIVENESS_TIMEOUT_S constant replacing magic 30 (coderabbitai)

Also updated test conftest stubs to match current upstream import graph.

14/14 tests passing.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a79ac91799


Comment on lines +101 to +102

```python
if self._is_shutting_down():
    return
```


P2: Emit shutdown status before terminating Watch stream

This early return can drop a real status transition during shutdown: if set_not_serving() runs after status was computed for the current iteration but before this check executes, the stream exits immediately without sending the final NOT_SERVING update. In that race window, Watch clients only observe EOF and miss the health-state change event they rely on for routing decisions.


Contributor Author


Evaluated -- pushing back on this one.

The race requires set_not_serving() to fire between _resolve_watch_status() returning and _is_shutting_down() executing. In asyncio's single-threaded cooperative model:

  • SGLang: _compute_watch_status() is sync -- no yield point, race is impossible.
  • vLLM: async via await check_health(), but the window between the return of _resolve_watch_status() and the _is_shutting_down() check is a single Python statement with no await -- no coroutine switch can happen there.

Even in the theoretical case where shutdown lands between _resolve_watch_status yielding control (during check_health()) and the shutdown check: the client would see SERVING then EOF. Any conformant Watch client treats EOF as "server unavailable" -- the gRPC transport-level disconnect is the primary shutdown signal, not an in-band NOT_SERVING message.

Not worth the added complexity.
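The no-interleaving claim above can be checked with a toy example (illustrative only; plain dicts stand in for the servicer state):

```python
import asyncio

async def watcher(state: dict, seen: list) -> None:
    status = state["status"]
    # No await between these two reads, so no other task can run between
    # them: the shutdown flag cannot flip "mid-statement".
    shutting_down = state["shutting_down"]
    seen.append((status, shutting_down))
    await asyncio.sleep(0)  # only at this point may other tasks run

async def flipper(state: dict) -> None:
    state["shutting_down"] = True  # stands in for set_not_serving()

async def main() -> list:
    state = {"status": "SERVING", "shutting_down": False}
    seen: list = []
    # gather schedules watcher first; flipper cannot preempt it before
    # watcher reaches its first await.
    await asyncio.gather(watcher(state, seen), flipper(state))
    return seen

print(asyncio.run(main()))  # [('SERVING', False)]
```

The watcher always observes a consistent (status, shutdown) pair, which is the cooperative-scheduling property the pushback relies on.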


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py (1)

61-97: 🧹 Nitpick | 🔵 Trivial

Factor status resolution into one helper before Check() and Watch() drift apart.

This file now has two copies of the same shutdown/service-dispatch/health-mapping logic. A shared helper returning just the status enum would keep the RPCs aligned, with Check() remaining responsible for context.set_code() / context.set_details() on unknown services.

Also applies to: 99-119

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py` around lines 61 -
97, Extract the duplicated shutdown/service-dispatch/health-mapping logic into a
single async helper (e.g., async def _get_health_status(self, service_name) ->
int) that returns a health_pb2.HealthCheckResponse status enum; the helper
should check self._shutting_down, map OVERALL_SERVER/VLLM_SERVICE to awaiting
self.async_llm.check_health() and return SERVING or NOT_SERVING accordingly, and
return SERVICE_UNKNOWN for unknown services; update Check() and Watch() to call
this helper for the status resolution while keeping Check() responsible for
setting context.set_code()/set_details() when the helper returns
SERVICE_UNKNOWN.
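A hedged sketch of the helper this comment asks for (field names like `async_llm` and `_shutting_down` come from the comment text; the class body is an assumption, and the enum values follow the gRPC health proto):

```python
import asyncio

# Enum values from grpc.health.v1.HealthCheckResponse.ServingStatus
SERVING, NOT_SERVING, SERVICE_UNKNOWN = 1, 2, 3

class VllmHealthServicerSketch:
    """Illustrative only; not the real VllmHealthServicer."""

    OVERALL_SERVER = ""      # assumed service-name constants
    VLLM_SERVICE = "vllm"

    def __init__(self, async_llm):
        self.async_llm = async_llm
        self._shutting_down = False

    async def _get_health_status(self, service_name: str) -> int:
        """Single status-resolution path shared by Check() and Watch()."""
        if self._shutting_down:
            return NOT_SERVING
        if service_name in (self.OVERALL_SERVER, self.VLLM_SERVICE):
            try:
                await self.async_llm.check_health()
                return SERVING
            except Exception:
                return NOT_SERVING
        return SERVICE_UNKNOWN

class FakeLLM:
    async def check_health(self):
        return None

servicer = VllmHealthServicerSketch(FakeLLM())
print(asyncio.run(servicer._get_health_status("vllm")))     # 1 (SERVING)
print(asyncio.run(servicer._get_health_status("unknown")))  # 3 (SERVICE_UNKNOWN)
```

Check() would still own context.set_code()/set_details() for the SERVICE_UNKNOWN case, as the comment specifies.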
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@grpc_servicer/tests/test_vllm_health_watch.py`:
- Around line 99-109: The test test_watch_handles_client_cancel is closing the
generator with gen.aclose(), which raises GeneratorExit rather than triggering
asyncio.CancelledError handled by HealthWatchMixin.Watch(); update the test to
create an asyncio Task that awaits gen.__anext__ (or awaits the generator
consumption) and then cancel that Task to simulate a real client disconnect so
the Watch() CancelledError branch is exercised; refer to servicer.Watch and
HealthWatchMixin.Watch() when making the change (apply the same pattern to the
SGLang sibling test).



📥 Commits

Reviewing files that changed from the base of the PR and between 012e5f6 and a79ac91.

📒 Files selected for processing (8)
  • grpc_servicer/pyproject.toml
  • grpc_servicer/smg_grpc_servicer/health_watch.py
  • grpc_servicer/smg_grpc_servicer/sglang/health_servicer.py
  • grpc_servicer/smg_grpc_servicer/vllm/health_servicer.py
  • grpc_servicer/tests/__init__.py
  • grpc_servicer/tests/conftest.py
  • grpc_servicer/tests/test_sglang_health_watch.py
  • grpc_servicer/tests/test_vllm_health_watch.py

…ests

Replace aclose()-based cancel tests with Task.cancel() to exercise the
actual asyncio.CancelledError path inside Watch(), matching real gRPC
client disconnect behavior. aclose() only triggers GeneratorExit.

Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@grpc_servicer/tests/test_sglang_health_watch.py`:
- Around line 41-45: The test uses open-ended async for loops over
servicer.Watch which can hang if shutdown signaling breaks; replace these with a
bounded collector using asyncio.wait_for by creating a helper (e.g.,
_collect_watch) that runs the async iterator and accumulates response.status,
then call asyncio.wait_for(_collect_watch(servicer.Watch(request_msg,
grpc_context)), timeout=SHORT_TIMEOUT) in tests; update occurrences referencing
servicer.Watch and servicer.set_not_serving (lines around the current loops) to
use the helper so tests fail fast instead of hanging.

In `@grpc_servicer/tests/test_vllm_health_watch.py`:
- Around line 34-39: The streaming test uses servicer.Watch and can hang if the
stream never terminates; wrap the async collection in an explicit timeout using
asyncio.wait_for (or your existing async helper) so the test fails fast.
Specifically, change the block that iterates over async for response in
servicer.Watch(request_msg, grpc_context) (and similar blocks around
servicer.set_not_serving) to collect the responses inside an await
asyncio.wait_for(..., timeout=some_seconds) call (e.g., gather into a coroutine
that consumes one response then calls servicer.set_not_serving) so the test
raises on timeout instead of hanging.


📥 Commits

Reviewing files that changed from the base of the PR and between a79ac91 and 73c24f2.

📒 Files selected for processing (2)
  • grpc_servicer/tests/test_sglang_health_watch.py
  • grpc_servicer/tests/test_vllm_health_watch.py

Comment on lines +41 to +45

```python
async for response in servicer.Watch(request_msg, grpc_context):
    received.append(response.status)
    if len(received) == 1:
        servicer.set_not_serving()
```



🧹 Nitpick | 🔵 Trivial

Bound stream-consumption waits to prevent indefinite test hangs on regressions.

These async for loops are open-ended and rely on side effects to terminate. If a future change breaks shutdown signaling, this module can hang instead of failing fast. Wrap stream collection in asyncio.wait_for(...) (or a shared helper) with a small timeout.

⏱️ Suggested pattern

```diff
+async def _collect_watch(servicer, request_msg, grpc_context, timeout_s=1.0):
+    async def _run():
+        items = []
+        async for response in servicer.Watch(request_msg, grpc_context):
+            items.append(response.status)
+        return items
+    return await asyncio.wait_for(_run(), timeout=timeout_s)
```

Then use _collect_watch(...) in each test instead of bare open-ended async for.

Also applies to: 63-66, 83-86, 120-123, 144-148, 172-175

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grpc_servicer/tests/test_sglang_health_watch.py` around lines 41 - 45, The
test uses open-ended async for loops over servicer.Watch which can hang if
shutdown signaling breaks; replace these with a bounded collector using
asyncio.wait_for by creating a helper (e.g., _collect_watch) that runs the async
iterator and accumulates response.status, then call
asyncio.wait_for(_collect_watch(servicer.Watch(request_msg, grpc_context)),
timeout=SHORT_TIMEOUT) in tests; update occurrences referencing servicer.Watch
and servicer.set_not_serving (lines around the current loops) to use the helper
so tests fail fast instead of hanging.

Comment on lines +34 to +39

```python
async for response in servicer.Watch(request_msg, grpc_context):
    received.append(response.status)
    if len(received) == 1:
        servicer.set_not_serving()

assert received[0] == SERVING
```


🧹 Nitpick | 🔵 Trivial

Add explicit timeout bounds around stream collection to avoid stuck CI jobs.

These tests currently depend on Watch termination side effects; if termination regresses, they can hang. Use asyncio.wait_for (directly or via helper) so failures surface deterministically.

Also applies to: 68-71, 91-94, 126-129, 147-151, 162-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grpc_servicer/tests/test_vllm_health_watch.py` around lines 34 - 39, The
streaming test uses servicer.Watch and can hang if the stream never terminates;
wrap the async collection in an explicit timeout using asyncio.wait_for (or your
existing async helper) so the test fails fast. Specifically, change the block
that iterates over async for response in servicer.Watch(request_msg,
grpc_context) (and similar blocks around servicer.set_not_serving) to collect
the responses inside an await asyncio.wait_for(..., timeout=some_seconds) call
(e.g., gather into a coroutine that consumes one response then calls
servicer.set_not_serving) so the test raises on timeout instead of hanging.


Labels

dependencies Dependency updates grpc gRPC client and router changes tests Test changes
